Harshita Sharma

ECOMMERCE SALES ANALYSIS

  • An ecommerce sales analysis is the process of examining data from online transactions to understand how well an ecommerce business is performing and how it can improve. It is like putting your digital store under a microscope to uncover what is working, what is not, and where the hidden opportunities lie.

IMPORT LIBRARIES

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

LOAD DATASET

# Load the workbook into a DataFrame (reading .xlsx files requires openpyxl)
df = pd.read_excel("Ecommerce Sales Analysis.xlsx")

df

ANALYZE THE DATA

df.head()  # first five rows

df.tail()  # last five rows

df.info()  # column names, non-null counts, and dtypes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Row ID         9994 non-null   int64         
 1   Order ID       9994 non-null   object        
 2   Year           9994 non-null   int64         
 3   Order Date     9994 non-null   datetime64[ns]
 4   Ship Date      9994 non-null   datetime64[ns]
 5   Ship Mode      9994 non-null   object        
 6   Customer ID    9994 non-null   object        
 7   Customer Name  9994 non-null   object        
 8   Segment        9994 non-null   object        
 9   Country        9994 non-null   object        
 10  City           9994 non-null   object        
 11  State          9994 non-null   object        
 12  Postal Code    9994 non-null   int64         
 13  Region         9994 non-null   object        
 14  Product ID     9994 non-null   object        
 15  Category       9994 non-null   object        
 16  Sub-Category   9994 non-null   object        
 17  Product Name   9994 non-null   object        
 18  Sales          9994 non-null   float64       
 19  Quantity       9994 non-null   int64         
 20  Discount       9994 non-null   float64       
 21  Profit         9994 non-null   float64       
dtypes: datetime64[ns](2), float64(3), int64(4), object(13)
memory usage: 1.7+ MB

STATISTICAL SUMMARY

df.describe().transpose()

CHECK FOR DUPLICATION

df.nunique()  # number of distinct values in each column

df.duplicated().sum()  # number of rows that repeat an earlier row
np.int64(0)
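
The two checks above answer different questions, which a small example may make clearer. This is a minimal sketch on hypothetical toy data (the `toy` frame and its values are invented for illustration and are not taken from the dataset): `nunique()` counts distinct values per column, while `duplicated()` flags whole rows that repeat an earlier row.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are identical.
toy = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-2", "A-3"],
    "Sales": [10.0, 25.5, 25.5, 10.0],
})

# Distinct values in each column (says nothing about repeated rows).
distinct_per_column = toy.nunique()

# Rows that are exact copies of an earlier row.
duplicate_rows = int(toy.duplicated().sum())

# Keep the first occurrence of each row and drop the copies.
deduplicated = toy.drop_duplicates()

print(distinct_per_column.to_dict())
print(duplicate_rows)
print(len(deduplicated))
```

A result of 0 from `duplicated().sum()` on the real data would mean no row is an exact copy of another, even though individual columns such as `Order ID` can still contain repeated values.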

MISSING VALUES

df.isnull().sum()  # missing values per column

df.isnull().sum() / len(df) * 100  # percentage of missing values per column

DATA REDUCTION

df_clean = df.dropna()
df_clean
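
Calling `dropna()` with no arguments removes every row that has a missing value in any column. As a hedged sketch, assuming only some columns are essential to the analysis, the `subset` argument restricts the check to those columns. The `toy` frame below is hypothetical and stands in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap in each column.
toy = pd.DataFrame({
    "Sales": [100.0, np.nan, 50.0, 75.0],
    "Profit": [20.0, 5.0, np.nan, 10.0],
    "Ship Mode": ["Standard", "First", None, "Second"],
})

# Share of missing values per column, in percent.
missing_pct = toy.isnull().mean() * 100

# Drops a row if any column is missing.
dropped_any = toy.dropna()

# Drops a row only if 'Sales' is missing.
dropped_sales = toy.dropna(subset=["Sales"])

print(missing_pct.to_dict())
print(len(dropped_any), len(dropped_sales))
```

Checking the missing-value percentages first shows how many rows each choice would discard before committing to it.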

DATA TYPE

df.dtypes

DATA CLEANING

COLUMN NAME

df.columns.tolist()
['Row ID',
 'Order ID',
 'Year',
 'Order Date',
 'Ship Date',
 'Ship Mode',
 'Customer ID',
 'Customer Name',
 'Segment',
 'Country',
 'City',
 'State',
 'Postal Code',
 'Region',
 'Product ID',
 'Category',
 'Sub-Category',
 'Product Name',
 'Sales',
 'Quantity',
 'Discount',
 'Profit']

sns.set_theme(style="whitegrid")  # apply seaborn's whitegrid style to all plots

# Ensure the order date is a datetime column
df['Order Date'] = pd.to_datetime(df['Order Date'])

# Remove duplicate rows
df.drop_duplicates(inplace=True)

# Fill missing customer IDs; assigning the result back avoids the
# chained-assignment FutureWarning raised by df[col].fillna(..., inplace=True)
df['Customer ID'] = df['Customer ID'].fillna('Unknown')
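
Chained calls of the form `df[col].fillna(value, inplace=True)` act on an intermediate object, and pandas reports that this pattern will stop working in pandas 3.0. The following is a minimal sketch of the two replacements pandas suggests, run on a hypothetical `toy` frame rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "Customer ID": ["CG-1", None, "DV-2"],
    "Sales": [10.0, 20.0, np.nan],
})

# Option 1: fill the column and assign the result back.
toy["Customer ID"] = toy["Customer ID"].fillna("Unknown")

# Option 2: pass a {column: value} mapping to DataFrame.fillna.
toy = toy.fillna({"Sales": 0.0})

print(toy)
```

Both forms modify the frame itself rather than a temporary copy, so the filled values are kept.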

EDA (Exploratory Data Analysis)

# Count plot of order lines by ship mode
sns.countplot(data=df, x='Ship Mode')
plt.title('Ship Mode Distribution')
plt.show()

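df['Sales'].hist(bins=30)  # distribution of sales amounts (hist must be called)
plt.show()

sns.countplot(x='Year', data=df)  # number of order lines per year
plt.show()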

# Sales overview
# Total revenue: 'Sales' is assumed to already hold each line item's revenue,
# so it is used directly (Quantity * Profit does not measure revenue)
df['Revenue'] = df['Sales']
total_revenue = df['Revenue'].sum()

# Monthly revenue trend
df['Month'] = df['Order Date'].dt.to_period('M')
monthly_sales = df.groupby('Month')['Revenue'].sum()

monthly_sales.plot(kind='line', figsize=(12,6), title='Monthly Revenue')
plt.ylabel('Revenue')
plt.show()
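
The monthly grouping used above can be checked by hand on a few rows. This is a minimal sketch on hypothetical toy data (dates and amounts are invented): `dt.to_period('M')` maps each order date to its calendar month, and `groupby` then sums the values within each month.

```python
import pandas as pd

# Hypothetical toy data: two orders in January 2017 and two in February 2017.
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03", "2017-02-28"]
    ),
    "Sales": [100.0, 50.0, 80.0, 20.0],
    "Profit": [30.0, 5.0, -8.0, 4.0],
})

# Map every date to its month, then total Sales and Profit per month.
toy["Month"] = toy["Order Date"].dt.to_period("M")
monthly = toy.groupby("Month")[["Sales", "Profit"]].sum()

print(monthly)
```

Summing several columns in one `groupby` keeps the monthly totals aligned on the same index, which makes them easy to plot or compare side by side.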

#Top product
top_products = df.groupby('Product Name')['Revenue'].sum().sort_values(ascending=False).head(10)
top_products.plot(kind='barh', title='Top 10 Products by Revenue')
plt.xlabel('Revenue')
plt.show()

#scatter plot
sns.scatterplot(x='Sales', y='Profit', data=df)
plt.show()

#correlation matrix
plt.figure(figsize=(10,6))
sns.heatmap(df[['Quantity', 'Profit', 'Revenue']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

#Time series
df['Month'] = df['Order Date'].dt.to_period('M')
monthly_sales = df.groupby('Month')['Sales'].sum()

monthly_sales.plot(kind='line', figsize=(12,6), title='Monthly Sales Trend')
plt.ylabel('Sales')
plt.show()

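# Pairwise relationships between the numeric measures
g = sns.PairGrid(df, vars=['Sales', 'Quantity', 'Discount', 'Profit'])
g = g.map_upper(sns.scatterplot)
g = g.map_lower(sns.kdeplot)
g = g.map_diag(sns.kdeplot, fill=True)  # 'shade' is deprecated in favor of 'fill'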